This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter.
plot(cars)
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Cmd+Shift+K to preview the HTML file).
The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.
library("nycflights13")
library("tidyverse")
nycflights13::flights
View(flights)
?flights
jan1 <- filter(flights, month == 1, day == 1)
filter(flights, month == 11 | month == 12)
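In filter(), conditions separated by commas must all be TRUE (AND), while | keeps rows where either side is TRUE (OR). x %in% y is a shorthand for a chain of == tests joined by |. Here is a minimal sketch on a small made-up tibble. The name df and its values are hypothetical stand-ins for flights.

```r
library(dplyr)

# Hypothetical toy data standing in for flights
df <- tibble(month = c(1, 1, 11, 12, 12), day = c(1, 2, 1, 1, 25))

# Comma = AND: keeps only rows where month is 1 AND day is 1
jan1_toy <- filter(df, month == 1, day == 1)

# | = OR: keeps rows from November or December
nov_dec <- filter(df, month == 11 | month == 12)

# %in% selects the same rows as the | version
nov_dec_in <- filter(df, month %in% c(11, 12))
```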
If you want to determine whether a value is missing, use is.na().
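Almost any comparison involving NA returns NA, including NA == NA, so == cannot be used to detect missing values. Here is a quick check on a hypothetical vector x.

```r
x <- c(1, NA, 3)

# == propagates the missing value instead of answering the question
eq_na <- x == NA        # NA NA NA

# is.na() returns a plain TRUE/FALSE for each element
is_missing <- is.na(x)  # FALSE TRUE FALSE
```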
filter() only includes rows where the condition is TRUE; it excludes both FALSE and NA values. If you want to preserve missing values, ask for them explicitly:
- to ask explicitly, add an is.na() test to the condition with |, e.g. filter(df, is.na(x) | x > 1)
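The following sketch shows a row being dropped and then kept. It assumes a one-column tibble df with made-up values.

```r
library(dplyr)

df <- tibble(x = c(1, NA, 3))

# NA > 1 is NA, so filter() drops that row: only x = 3 remains
no_na <- filter(df, x > 1)

# Asking for missing values explicitly keeps them: NA and 3 remain
with_na <- filter(df, is.na(x) | x > 1)
```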
5.2 Exercises
filter(flights, arr_delay >= 120) #10,200 flights
filter(flights, dest == "IAH" | dest == "HOU") #9,313 flights
airlines #lookup table of carrier codes (AA = American, DL = Delta, UA = United)
filter(flights, carrier == "DL" | carrier == "AA" | carrier == "UA") #139,504 flights
filter(flights, month >= 7, month <= 9) #86,326 flights
#or, equivalently, with between():
filter(flights, between(month, 7, 9))
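between(x, left, right) is a shortcut for x >= left & x <= right, and both ends are inclusive. The sketch below checks this on a hypothetical vector of month values.

```r
library(dplyr)

# Hypothetical month values
m <- c(5, 6, 7, 8, 9, 10)

# between() includes both endpoints, so 7, 8 and 9 are kept
via_between <- between(m, 7, 9)
via_compare <- m >= 7 & m <= 9
```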
summary(flights$dep_time)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1 907 1401 1349 1744 2400 8255
filter(flights, dep_time <= 600 | dep_time == 2400) #9,373 flights (midnight is recorded as 2400)
filter(flights, is.na(dep_time)) #8,255 flights, matching the NA count in the summary above
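dep_time holds clock times in HHMM form, and the summary above shows a maximum of 2400. A departure at midnight is therefore stored as 2400, not 0, and that is why the condition needs the extra dep_time == 2400 term. The sketch below applies the same conditions with base R subsetting on a hypothetical vector of times.

```r
# Hypothetical departure times in HHMM form; 2400 is midnight
dep_time <- c(1, 559, 600, 601, 1200, 2359, 2400)

# Without the 2400 test, the midnight departure is missed
early_only <- dep_time[dep_time <= 600]

# With it, midnight through 6:00 am (inclusive) is covered
early_all <- dep_time[dep_time <= 600 | dep_time == 2400]
```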
5.3 arrange() works similarly to filter(), except that instead of selecting rows, it changes their order.
arrange(flights, year, month, day)
Use desc() to re-order by a column in descending order. Missing values are always sorted at the end, even when using desc().
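The sketch below sorts a hypothetical one-column tibble with made-up values in both directions.

```r
library(dplyr)

df <- tibble(x = c(5, NA, 2))

# Ascending: 2, 5, then NA last
asc <- arrange(df, x)

# Descending: 5, 2, and NA is still last
dsc <- arrange(df, desc(x))
```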
5.3 Exercises
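arrange(flights, is.na(desc(dep_delay))) #does not work: is.na() gives FALSE/TRUE and FALSE sorts first, so missing values still end up last
arrange(flights, desc(is.na(dep_time)), dep_time) #missing dep_time first, then the rest in ascending order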
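is.na() turns a column into FALSE/TRUE, so the direction applied to that logical column decides where the missing rows go. desc(is.na(x)) puts TRUE (missing) first, and the second key then orders the remaining rows. The sketch below compares the two attempts above on a hypothetical tibble with made-up values.

```r
library(dplyr)

df <- tibble(x = c(3, NA, 1, NA))

# is.na(desc(x)) is FALSE/TRUE and FALSE sorts first: NAs stay at the end
na_last <- arrange(df, is.na(desc(x)))

# desc(is.na(x)) puts TRUE (missing) first; x then orders the rest
na_first <- arrange(df, desc(is.na(x)), x)
```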
arrange(flights, desc(dep_delay)) #most delayed flights first
arrange(flights, dep_delay) #flights that left earliest (most negative dep_delay) first
arrange(flights, desc(distance)) #longest flight first: 4,983 miles
arrange(flights, distance) #shortest flight first: 17 miles
5.4 select() allows you to rapidly zoom in on a useful subset of columns using operations based on the names of the variables.
This matters because some data sets have hundreds or thousands of variables, and not all of them are needed for an analysis; select() lets you look at and work with just the ones that are.
Helpers that can be used inside select():
- starts_with("abc"): matches names that begin with "abc".
- ends_with("xyz"): matches names that end with "xyz".
- contains("ijk"): matches names that contain "ijk".
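The sketch below applies each helper to a hypothetical one-row tibble that reuses some of the flights column names. The values are made up.

```r
library(dplyr)

# Hypothetical one-row tibble using some of the flights column names
df <- tibble(dep_time = 517, dep_delay = 2, arr_time = 830,
             arr_delay = 11, carrier = "UA")

starts <- names(select(df, starts_with("dep")))  # "dep_time"  "dep_delay"
ends   <- names(select(df, ends_with("delay")))  # "dep_delay" "arr_delay"
has    <- names(select(df, contains("time")))    # "dep_time"  "arr_time"
```

Columns are returned in the order in which they appear in the data.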
5.4 Exercises
select(flights, "dep_time", "dep_delay", "arr_time", "arr_delay")
select(flights, dep_time, dep_delay, arr_time, arr_delay)
select(flights, 4, 6, 7, 9) #this one uses column numbers of the variables
select(flights, starts_with("dep_"), starts_with("arr_"))
5.5 It is often useful to add new columns that are functions of existing columns; for this we can use mutate().
flights_sml <- select(flights,
year:day,
ends_with("delay"),
distance,
air_time
)
mutate(flights_sml,
gain = dep_delay - arr_delay,
speed = distance / air_time * 60
)
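By default, mutate() adds the new columns at the end of the data frame. The sketch below runs the same two formulas on a hypothetical two-row tibble whose columns mirror those in flights_sml. The values are made up. It assumes the flights units: distance in miles and air_time in minutes.

```r
library(dplyr)

# Made-up values; columns mirror the ones used in flights_sml above
df <- tibble(dep_delay = c(10, -2), arr_delay = c(5, 8),
             distance = c(120, 300), air_time = c(30, 60))

out <- mutate(df,
  gain  = dep_delay - arr_delay,    # positive = minutes made up in the air
  speed = distance / air_time * 60  # miles per hour
)
```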
5.5 Exercises
arr_time #typed on its own, this fails: arr_time is a column of flights, not a standalone object
Error: object 'arr_time' not found
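arr_time exists only as a column inside flights, so R cannot find it on its own. The sketch below shows two usual ways to refer to a column. It uses a hypothetical two-row tibble with illustrative values in the same HHMM format.

```r
library(dplyr)

# Illustrative values in the same HHMM format as flights
df <- tibble(dep_time = c(517, 533), arr_time = c(830, 850),
             air_time = c(227, 227))

# $ pulls one column out of the data frame as a plain vector
arr <- df$arr_time

# Inside a dplyr verb, column names can be used directly
cmp <- transmute(df, air_time, clock_diff = arr_time - dep_time)
```

dep_time and arr_time are clock times written as HHMM rather than minutes, so clock_diff is not a flight duration in minutes. That is one reason it does not line up with air_time.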
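5.6 summarise() collapses a data frame to a single row of summary values. It is most useful when paired with group_by(), which makes it produce one row per group.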
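The sketch below contrasts an ungrouped and a grouped summarise() on a hypothetical mini version of flights with made-up values. It uses na.rm = TRUE so that the missing delay does not turn the mean into NA.

```r
library(dplyr)

# Hypothetical mini version of flights
df <- tibble(month = c(1, 1, 2, 2),
             dep_delay = c(10, NA, 4, 6))

# Without grouping: the whole table collapses to one row
overall <- summarise(df, delay = mean(dep_delay, na.rm = TRUE))

# With group_by(): one summary row per month
by_month <- summarise(group_by(df, month),
                      delay = mean(dep_delay, na.rm = TRUE))
```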
summary(flights) #base R summary(), which is different from dplyr's summarise()
| variable | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | NA's |
|---|---|---|---|---|---|---|---|
| year | 2013 | 2013 | 2013 | 2013 | 2013 | 2013 | |
| month | 1.000 | 4.000 | 7.000 | 6.549 | 10.000 | 12.000 | |
| day | 1.00 | 8.00 | 16.00 | 15.71 | 23.00 | 31.00 | |
| dep_time | 1 | 907 | 1401 | 1349 | 1744 | 2400 | 8255 |
| sched_dep_time | 106 | 906 | 1359 | 1344 | 1729 | 2359 | |
| dep_delay | -43.00 | -5.00 | -2.00 | 12.64 | 11.00 | 1301.00 | 8255 |
| arr_time | 1 | 1104 | 1535 | 1502 | 1940 | 2400 | 8713 |
| sched_arr_time | 1 | 1124 | 1556 | 1536 | 1945 | 2359 | |
| arr_delay | -86.000 | -17.000 | -5.000 | 6.895 | 14.000 | 1272.000 | 9430 |
| flight | 1 | 553 | 1496 | 1972 | 3465 | 8500 | |
| air_time | 20.0 | 82.0 | 129.0 | 150.7 | 192.0 | 695.0 | 9430 |
| distance | 17 | 502 | 872 | 1040 | 1389 | 4983 | |
| hour | 1.00 | 9.00 | 13.00 | 13.18 | 17.00 | 23.00 | |
| minute | 0.00 | 8.00 | 29.00 | 26.23 | 44.00 | 59.00 | |
| time_hour | 2013-01-01 05:00:00 | 2013-04-04 13:00:00 | 2013-07-03 10:00:00 | 2013-07-03 05:22:54 | 2013-10-01 07:00:00 | 2013-12-31 23:00:00 | |

carrier, tailnum, origin and dest are character columns: Length 336776, Class character, Mode character.